Entry Name: "SJU-Yeon-MC2"
VAST Challenge 2015
Mini-Challenge 2
Team Members:
Hanbyul Yeon, Sejong University, hbyeon109@sju.ac.kr, PRIMARY
Seokyeon Kim, Sejong University, ksy0586@sju.ac.kr
Mingyu Pi, Sejong University, pmg9405@sju.ac.kr
Sangbong Yoo, Sejong University, usangbong@sju.ac.kr
Yun Jang, Sejong University, jangy@sejong.edu
Student Team: Yes
Did you use data from both mini-challenges? No
Analytic Tools Used:
Gephi, http://gephi.github.io/
Tableau, http://www.tableau.com/
Visual Analytics framework developed at Sejong University.
Approximately how many hours were spent working on this submission in total? 2 weeks
May we post your submission in the Visual Analytics Benchmark Repository after
VAST Challenge 2015 is complete? Yes
Video Download
Video: http://vis.sejong.ac.kr/sju-yeon-mc2-video.wmv
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Questions
MC2.1 – Identify those IDs that stand out for their large volumes of
communication. For each of these IDs
a. Characterize the communication patterns you see.
b. Based on these patterns, what do you hypothesize about these IDs?
Limit your response to no more than 4 images and 300 words.
The IDs that stand out for their large volumes of communication are ID 1278894
and ID 839736. Figure 1-1 presents the communication counts by sender in (a)
and by receiver in (b). The bar lengths encode the counts and show that these
two IDs produce a large volume of communication. To characterize the
communication patterns, we plot the communication counts over time in Figures
1-2 and 1-3. Figure 1-2 shows the communication between ID 1278894 and other
visitors, whereas Figure 1-3 shows the communication between ID 839736 and
other visitors. As shown in Figure 1-2, ID 1278894 sent messages regularly
every 5 minutes for an hour and received messages from visitors between the
send timestamps. In contrast, ID 839736 sent and received messages at the same
times, and its sending pattern more or less followed its receiving pattern, as
if ID 839736 and the other visitors were in conversation. Based on these
patterns, we hypothesize that ID 1278894 was an administrator in charge of
events who sent and received event-related information, and that ID 839736 was
in charge of a hotline (for example, Q&A or waiting times) and communicated
directly with other visitors.
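The counting and the 5-minute regularity check described above can be sketched as follows. This is only an illustration, not the Tableau workflow we used: the record layout (timestamp, sender, receiver), the sample rows, and the helper names volume_by_role and send_gaps_minutes are all assumptions made for the example.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample rows (timestamp, sender ID, receiver ID); not challenge data.
records = [
    ("2014-06-08 12:00:00", "1278894", "100"),
    ("2014-06-08 12:00:00", "1278894", "101"),
    ("2014-06-08 12:05:00", "1278894", "100"),
    ("2014-06-08 12:10:00", "1278894", "102"),
    ("2014-06-08 12:03:10", "100", "1278894"),
    ("2014-06-08 12:07:45", "101", "839736"),
    ("2014-06-08 12:07:50", "839736", "101"),
]

def volume_by_role(rows):
    """Count communications per ID, separately as sender and as receiver."""
    senders = Counter(sender for _, sender, _ in rows)
    receivers = Counter(receiver for _, _, receiver in rows)
    return senders, receivers

def send_gaps_minutes(rows, sender_id):
    """Minutes between consecutive distinct send times of one ID."""
    times = sorted({
        datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        for ts, sender, _ in rows
        if sender == sender_id
    })
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]

senders, receivers = volume_by_role(records)
gaps = send_gaps_minutes(records, "1278894")
```

A constant gap list (here every gap is 5 minutes) corresponds to the regular sending pattern in Figure 1-2, while irregular gaps that track incoming messages would correspond to the conversational pattern in Figure 1-3.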
Figure 1-1 (a) Communication counts by senders. (b) Communication counts by receivers.
Figure 1-2 Communication pattern between ID 1278894 and visitors.
Figure 1-3 Communication pattern between ID 839736 and visitors.
MC2.2 – Describe up to 10 communications patterns in the data. Characterize who
is communicating, with whom, when and where. If you have more than 10 patterns
to report, please prioritize those patterns that are most likely to relate to
the crime.
Limit your response to no more than 10 images and 1000 words.
To differentiate the communication patterns, we use modularity, a measure of
network structure. Modularity gives hints on how to divide a network into
modules such as groups, clusters, or communities. For a given division of the
network's vertices into modules, the modularity reflects the concentration of
edges within modules compared with a random distribution of links between all
nodes regardless of modules. We computed the modularity using Gephi, an
open-source graph visualization platform. Figure 2-1 presents a visualization
of the communication patterns according to the modularity. As seen in the
figure, the modules are differentiated by node location, color, and layout.
Since we assume that the vandalism was discovered on Sunday, we show the entire
communication network for Sunday. We further investigate this network
visualization pattern by pattern. We found eight different patterns in the
communication network.
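As an illustration of the modularity measure described above, the following minimal sketch scores a given partition of an undirected, unweighted graph using Q = sum over modules of (internal edges / m) - (module degree sum / 2m)^2. It is not Gephi's implementation: Gephi also searches for a good partition, whereas this sketch only scores one that is supplied. The function name, the toy graph, and the partition are assumptions made for the example.

```python
def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its module label.
    """
    m = len(edges)
    internal = {}    # module -> number of edges with both ends inside it
    degree_sum = {}  # module -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

# Toy graph (hypothetical): two triangles joined by a single edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_split = modularity(edges, two_groups)   # partition matching the triangles
q_single = modularity(edges, one_group)   # everything in one module
```

Splitting the toy graph along its two triangles gives a higher score than placing all nodes in one module, which is the sense in which modularity hints at where the groups are.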
Figure 2-1 Overview of communication network.
Figure 2-2 (a) Pattern 1: Communication pattern with the park service (ID 839736). (b) Pattern 2: Communication pattern with the park service (ID 1278894) and the external node.
Pattern 1 is shown in Figure 2-2 (a). This pattern represents all communication
connected to the park service, ID 839736, which is marked in the figure. Since
the majority of the communication is related to the park service, this pattern
is located in the center of the entire network. Most of the nodes in this
pattern are connected to the park service.
Pattern 2 is presented in Figure 2-2 (b). This pattern covers the
communications with the park service, ID 1278894, and with the external node.
Since there are many communications to ID 1278894 and the external node, this
pattern is also placed in the center of the entire network visualization. The
main difference between the two is that Pattern 1 is more irregular than
Pattern 2 over time on Sunday. The counts of communications with ID 839736
varied strongly over time on Sunday due to the vandalism, whereas the counts of
communications with ID 1278894 and the external node varied only slightly.
Figure 2-3 (a) Pattern 3: Communications between people in the pattern group. (b) Pattern 4: Different communications between the group and the park service or the external node.
Pattern 3 is shown in Figure 2-3 (a). Its communications occurred between
people in the pattern group. Since there is no specific regular pattern in this
group, the nodes are located more or less randomly.
Pattern 4 contains three kinds of communication: between the group and the park
service or the external node, between the group and a certain person, and
between people in the group. This is a regular pattern, as seen in Figure 2-3
(b).
Figure 2-4 (a) Pattern 5: Communication networks only with park service and between people in the group. (b) Pattern 6: Small communication networks with 4 people.
Figure 2-5 (a) Pattern 7: Small communication networks with 3 people. (b) Pattern 8: Small communication networks with 2 people.
Pattern 5 is similar to Pattern 4, but it contains only communications between
the group and the park service and between people in the group. This is shown
in Figure 2-4 (a).
Patterns 6, 7, and 8 are small communication networks whose groups contain a
small number of people. The groups of 4, 3, and 2 people are presented in
Figures 2-4 (b), 2-5 (a), and 2-5 (b), respectively.
To analyze the communication patterns over time, we plot the communication
counts separately for each location.
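The per-location counts over time can be produced along the following lines. This is a minimal sketch under stated assumptions: the (timestamp, location) row layout, the 5-minute bin width, the sample rows, and the function name counts_by_location are chosen for illustration and are not taken from our actual tooling.

```python
from collections import Counter
from datetime import datetime

def counts_by_location(rows, bin_minutes=5):
    """Count communications per (location, time bin).

    rows: iterable of (timestamp string, location name).
    bin_minutes: bin width; assumed to divide 60 evenly.
    """
    counts = Counter()
    for ts, location in rows:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        # Floor the timestamp to the start of its bin.
        start = t.replace(minute=t.minute - t.minute % bin_minutes, second=0)
        counts[(location, start.strftime("%Y-%m-%d %H:%M"))] += 1
    return counts

# Hypothetical sample rows; not challenge data.
rows = [
    ("2014-06-08 11:33:10", "Wet Land"),
    ("2014-06-08 11:34:59", "Wet Land"),
    ("2014-06-08 11:36:00", "Wet Land"),
    ("2014-06-08 11:02:00", "Coaster Alley"),
]
binned = counts_by_location(rows)
```

Plotting one series per location from these counts gives graphs of the kind shown in the figures that follow; a spike is a bin whose count is far above that location's usual level.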
Figure 2-6 Communication patterns of Pattern 2.
Figure 2-6 shows all records included in Pattern 2. No abnormal pattern is
visible in this figure. Although we do not show the graph for Pattern 1 here,
it is similar to Figure 1-3.
Figure 2-7 Communication patterns of Pattern 3.
Figure 2-7 presents the communication patterns of Pattern 3. Although the
graphs do not seem to contain any abnormal patterns, there are spikes at the
times of Scott Jones' showcase in Wet Land and Coaster Alley. This might
indicate that many of his fans are grouped in Pattern 3. This group was also
visible every day (Friday, Saturday, Sunday).
Figure 2-8 Communication pattern of Pattern 4.
Figure 2-8 presents the communication patterns of Pattern 4. As seen in the
figure, the group in Pattern 4 was visible only on a specific day, Sunday in
this case. We guess that this group consists of normal park visitors who
reacted with many communications right after the vandalism was discovered in
Wet Land.
Figure 2-9 Communication pattern of Pattern 7 – Example 1.
As mentioned earlier, Patterns 6, 7, and 8 are similar and differ only in the
number of people in each group. We extracted one group from Pattern 7 and
plotted its communication patterns in Figure 2-9. People in this group seemed
to stay all day on Saturday and Sunday, and we did not find any abnormal
communication pattern.
Figure 2-10 Communication pattern of Pattern 7 – Example 2.
As in Figure 2-9, we picked another group from Pattern 7, presented in Figure
2-10. We suspect that people in this group were related to the vandalism. On
Saturday morning they came directly to Wet Land and then left. They came back
on Sunday and stayed in Wet Land when the vandalism was discovered. It seems
that they prepared and practiced the crime on Saturday and then executed it on
Sunday.
MC2.3 – From this data, can you hypothesize when the crime was discovered?
Describe your rationale.
Limit your response to no more than 3 images and 300 words.
We hypothesize that the crime was discovered between 11:30am and 11:45am on
Sunday. To derive this hypothesis, we first computed an entropy for every
person in the communication data to extract irregularities in the communication
patterns. We then counted the number of people whose entropies were greater
than 0.5 and plotted the counts in Figure 3-1 (a). The figure shows a severe
irregularity between 11am and 12pm on Sunday. Assuming the vandalism was
discovered around that time, we further investigated the data to estimate where
the vandalism was found, using X-Y plots of entropy vs. the number of
communications in Figure 3-1 (b-e). Location is encoded by color. There is no
outstanding pattern in (b-d), but many people in Wet Land produced a large
number of communications, as seen in (e). We then checked this hypothesis
against other views of the data. For the time of discovery, we searched for
abnormal communication patterns of ID 839736, which we identified as a Q&A
hotline. We compared the number of communications by all visitors with the
number of communications involving ID 839736 in Figure 3-2. The patterns for
the visitors are more or less similar every day, but the pattern for ID 839736
is very different around noon on Sunday. This implies that many answers were
needed right after the vandalism was discovered. We also plotted the number of
communications over time by location, excluding the park service (ID 839736 and
ID 1278894), in Figure 3-3. From this figure, we infer that Scott Jones'
showcase took place at 11AM and 4PM every day in Coaster Alley and that the
vandalism was discovered between 11:33am and 11:41am on Sunday in Wet Land.
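One possible form of the entropy step is sketched below. The response above does not state the exact entropy definition, so this sketch assumes a normalized Shannon entropy over how each sender's messages are spread across recipients; that definition, the sample rows, and the function names are assumptions made for illustration. Only the 0.5 threshold is taken from the text.

```python
import math
from collections import Counter, defaultdict

def normalized_entropy(counts):
    """Shannon entropy of a count distribution, scaled to the range [0, 1]."""
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0  # a single outcome carries no uncertainty
    h = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return h / math.log2(len(counts))

def people_above_threshold(rows, threshold=0.5):
    """IDs whose recipient distribution has entropy above the threshold.

    rows: iterable of (sender ID, receiver ID) within one time window.
    """
    per_sender = defaultdict(Counter)
    for sender, receiver in rows:
        per_sender[sender][receiver] += 1
    return sorted(
        sender
        for sender, recipients in per_sender.items()
        if normalized_entropy(list(recipients.values())) > threshold
    )

# Hypothetical window of communications; not challenge data.
window = (
    [("A", "X"), ("A", "Y")]              # evenly spread over two recipients
    + [("B", "X")] * 9 + [("B", "Y")]     # concentrated on one recipient
    + [("C", "X")] * 3                    # a single recipient
)
flagged = people_above_threshold(window)
```

Counting the flagged IDs in each time window would give a curve comparable in spirit to Figure 3-1 (a), although the actual values depend on the entropy definition used.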
Figure 3-1 (a) Number of people whose entropies were greater than 0.5. (b)-(e) X-Y plots for entropy vs. the number of communications.
Figure 3-2 Communication counts. The blue series includes all communications; the dark grey series includes only those between the park service (ID 839736) and visitors.
Figure 3-3 Communication trends by location, excluding the park service (ID 839736 and ID 1278894).